Closes #22. Added a test that the memory usage doesn't balloon. #23
Also added supporting code: a CI workflow that runs the test on a schedule, and the one-off script I used to measure the code's resource consumption.
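For readers unfamiliar with this kind of test, here is a minimal sketch of a "memory doesn't balloon" check. It is not the test added in this PR: it uses the standard-library `tracemalloc` module, which only sees allocations made through Python's allocator, and the helper name `peak_python_memory`, the workload, and the 10 MiB budget are all hypothetical.

```python
import tracemalloc
from typing import Any, Callable


def peak_python_memory(func: Callable[..., Any], *args: Any) -> int:
    """Return the peak number of bytes traced by tracemalloc while func(*args) runs."""
    tracemalloc.start()
    try:
        func(*args)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def test_memory_does_not_balloon() -> None:
    # Hypothetical workload and budget, chosen only for illustration.
    def workload() -> int:
        return sum(i * i for i in range(100_000))

    budget_bytes = 10 * 1024 * 1024  # 10 MiB
    assert peak_python_memory(workload) < budget_bytes
```

A check like this does not account for memory used by child processes or native libraries, so measuring an external tool would need a different mechanism.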
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff            @@
##              main      #23    +/-  ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files            5        5
  Lines          808      803     -5
  Branches       117      116     -1
=========================================
- Hits           808      803     -5
va7eex left a comment
I have some suggestions that are completely optional, otherwise I approve.
args = parser.parse_args()

resource_summaries: list[ResourceSummary] = []
sample_regex = re.compile(r"^.*/(.*)\.BA\.txt$")
I'd probably add this regex to the other regexes defined above.
parser.add_argument("input_dir", help="Directory to scan for HLA sequences")
parser.add_argument("--output_csv", help="CSV file summary", default="out.csv")
I would recommend that both of these be of type Path:
https://docs.python.org/3/library/argparse.html#type
You could be a bit more explicit and type the directory as "a directory", see this example: https://stackoverflow.com/a/51212150
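As a rough sketch of what this suggestion could look like, the snippet below types both arguments as `Path` and validates the directory at parse time. The helper names `existing_dir` and `build_parser` are hypothetical and not part of the PR; the argument names and help strings are taken from the diff above.

```python
import argparse
from pathlib import Path


def existing_dir(value: str) -> Path:
    """Convert a command-line string to a Path, rejecting non-directories."""
    path = Path(value)
    if not path.is_dir():
        # argparse reports ArgumentTypeError as a normal usage error.
        raise argparse.ArgumentTypeError(f"{value!r} is not a directory")
    return path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "input_dir",
        type=existing_dir,
        help="Directory to scan for HLA sequences",
    )
    parser.add_argument(
        "--output_csv",
        type=Path,
        default=Path("out.csv"),
        help="CSV file summary",
    )
    return parser
```

With this in place, `args.input_dir` and `args.output_csv` are already `Path` objects after `parse_args()`, so later code can call `.glob()` or use the `/` operator on them directly.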
for exon1_filename in glob.glob(f"{args.input_dir}/*.BA.txt"):
    sample_name: str = sample_regex.match(exon1_filename).group(1)
    exon2_filename: str = os.path.join(args.input_dir, f"{sample_name}.BB.txt")
    with open(exon1_filename) as f:
        exon1: str = f.read().strip()
    with open(exon2_filename) as f:
        exon2: str = f.read().strip()
Typing with Path, this could become:
for exon1_filepath in args.input_dir.glob("*.BA.txt"):
    sample_name: str = sample_regex.match(exon1_filepath.name).group(1)
    exon2_filepath: Path = exon1_filepath.with_name(exon1_filepath.name.replace("BA.txt", "BB.txt"))
    exon1 = exon1_filepath.read_text().strip()
    exon2 = exon2_filepath.read_text().strip()
    ...
    json_filepath = args.input_dir / f"{sample_name}.json"
    json_filepath.write_text(json.dumps(json_input))
    ...
    result = subprocess.run(
        [
            ...,
            json_filepath.as_posix(),
        ]
Using David's suggestion from review.
Co-authored-by: David Rickett <25559687+va7eex@users.noreply.github.com>
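A self-contained sketch of the Path-based loop from the review might look like the following. One detail to watch: the `sample_regex` shown earlier requires a `/` before the sample name, so it would not match a bare `exon1_filepath.name`; this sketch assumes a pattern without the directory prefix. The names `SAMPLE_REGEX` and `read_exon_pairs` are hypothetical, and skipping samples with no `.BB.txt` partner is an assumption rather than the script's actual behavior.

```python
import re
from pathlib import Path

# Applied to the file name only, so no directory prefix is needed.
SAMPLE_REGEX = re.compile(r"^(.*)\.BA\.txt$")


def read_exon_pairs(input_dir: Path) -> dict[str, tuple[str, str]]:
    """Map each sample name to its (exon1, exon2) file contents."""
    pairs: dict[str, tuple[str, str]] = {}
    for exon1_filepath in sorted(input_dir.glob("*.BA.txt")):
        match = SAMPLE_REGEX.match(exon1_filepath.name)
        if match is None:
            # Guard against names the glob accepts but the regex does not.
            continue
        sample_name = match.group(1)
        exon2_filepath = exon1_filepath.with_name(f"{sample_name}.BB.txt")
        if not exon2_filepath.is_file():
            # Assumption: samples without a matching .BB.txt are skipped.
            continue
        pairs[sample_name] = (
            exon1_filepath.read_text().strip(),
            exon2_filepath.read_text().strip(),
        )
    return pairs
```

Building the second file name from `sample_name` avoids the string `replace` call, which would also rewrite a `BA.txt` substring occurring earlier in the file name.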